
Remove IterDataPipe from Inference pipeline #96

Merged (11 commits) on Oct 3, 2024

Conversation

@gitttt-1234 (Contributor) commented Sep 27, 2024

This PR removes the IterDataPipe dependency from the inference pipeline. Currently, we use IterDataPipe modules to build the data pipeline from labels when running inference on .slp files. Since we're moving to LitData (related issue: #80), we're removing the IterDataPipe modules and implementing a thread-based approach, similar to our current VideoReader (related issue: #26).
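The thread-based pattern can be sketched as follows. This is a minimal illustration with hypothetical names (`ThreadedReader`, `consume`), not the actual VideoReader API: a background thread converts items into per-sample dicts and pushes them into a bounded queue, and the consumer drains the queue until a sentinel marks end-of-stream.

```python
import threading
import queue

SENTINEL = None  # marks end of stream


class ThreadedReader(threading.Thread):
    """Read items on a background thread and push sample dicts into a bounded queue."""

    def __init__(self, items, max_queue_size=4):
        super().__init__(daemon=True)
        self.items = items
        self.out = queue.Queue(maxsize=max_queue_size)

    def run(self):
        for item in self.items:
            self.out.put({"frame": item})  # per-frame sample dict
        self.out.put(SENTINEL)  # signal that no more samples are coming


def consume(reader):
    """Drain the reader's queue until the sentinel is seen."""
    results = []
    while True:
        sample = reader.out.get()
        if sample is SENTINEL:
            break
        results.append(sample["frame"])
    return results


reader = ThreadedReader(range(5))
reader.start()
frames = consume(reader)
reader.join()
print(frames)  # [0, 1, 2, 3, 4]
```

The bounded queue gives backpressure: the reader thread blocks on `put` when the consumer falls behind, so memory use stays constant regardless of the number of frames.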

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced LabelsReaderDP for enhanced data handling in pipelines.
    • Added support for loading pre-trained model weights in ModelTrainer classes.
    • New attributes in CentroidCrop class to utilize ground-truth centroids during inference.
  • Bug Fixes

    • Adjusted expected output shapes in various tests to align with updated processing logic.
  • Documentation

    • Updated docstrings and comments to reflect changes in data provider and processing methods.
  • Tests

    • Refactored tests to use LabelsReaderDP and updated assertions based on new data handling logic.

@gitttt-1234 gitttt-1234 changed the base branch from main to divya/load-backbone-weights September 27, 2024 16:52

coderabbitai bot commented Sep 27, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces significant changes across multiple files, primarily focusing on updating the data provider from LabelsReader to LabelsReaderDP. This change affects various classes and methods, including CentroidConfmapsPipeline, BottomUpPipeline, and several test files. Additionally, new classes like VideoReader and modifications to the Predictor class enhance the functionality of the inference pipeline. The updates also include adjustments to expected output shapes in tests and the addition of parameters for loading pre-trained weights in the model trainer classes.

Changes

File Change Summary
sleap_nn/data/pipelines.py Updated data_provider type to LabelsReaderDP in make_training_pipeline. Added centroids_confidence_maps and part_affinity_fields to keep_keys.
sleap_nn/data/providers.py Renamed LabelsReader to LabelsReaderDP. Introduced VideoReader class. Modified from_filename method and __iter__ method in LabelsReaderDP.
sleap_nn/data/resizing.py Updated provider type in SizeMatcher from LabelsReader to LabelsReaderDP.
sleap_nn/inference/predictors.py Renamed video_preprocess_config to preprocess_config. Added instances_key. Updated make_pipeline method signatures and restructured _predict_generator.
sleap_nn/inference/topdown.py Added use_gt_centroids and anchor_ind attributes to CentroidCrop. Modified constructor and forward method to handle ground-truth centroids.
sleap_nn/training/model_trainer.py Added trained_ckpts_path parameter to constructors and methods in ModelTrainer and subclasses for loading pre-trained weights.
tests/assets/minimal_instance/training_config.yaml Updated prv_runid field in wandb configuration from unspecified to null.
tests/data/test_augmentation.py Changed import from LabelsReader to LabelsReaderDP.
tests/data/test_confmaps.py Updated usage of LabelsReader to LabelsReaderDP in test functions.
tests/data/test_edge_maps.py Replaced LabelsReader with LabelsReaderDP in import and test function.
tests/data/test_instance_centroids.py Updated import and instantiation of LabelsReader to LabelsReaderDP.
tests/data/test_instance_cropping.py Changed LabelsReader to LabelsReaderDP in test function.
tests/data/test_normalization.py Updated import and instantiation of LabelsReader to LabelsReaderDP.
tests/data/test_pipelines.py Replaced LabelsReader with LabelsReaderDP and adjusted expected output keys in tests.
tests/data/test_providers.py Updated LabelsReader to LabelsReaderDP in tests. Added test_labelsreader_provider for new class.
tests/data/test_resizing.py Changed import from LabelsReader to LabelsReaderDP.
tests/fixtures/datasets.py Updated provider key from "LabelsReader" to "LabelsReaderDP" and modified model configuration parameters.
tests/inference/test_bottomup.py Updated test function to load labels using sio.load_slp and simplified data handling.
tests/inference/test_predictors.py Adjusted peak_threshold parameters and expected lengths in multiple tests.
tests/inference/test_single_instance.py Simplified data handling in the test by removing the previous pipeline.
tests/inference/test_topdown.py Refactored data handling and inference processes, focusing on direct data manipulation.
tests/training/test_model_trainer.py Updated expected tensor shapes and added verification for loading trained weights in tests.

Possibly related PRs

Suggested reviewers

  • talmo

🐇 In the code we hop and play,
With LabelsReaderDP leading the way.
Pipelines updated, tests in line,
Pre-trained weights, oh how they shine!
From CentroidCrop to Predictor bright,
Our models now leap to new heights! 🌟



@gitttt-1234 gitttt-1234 marked this pull request as ready for review September 28, 2024 01:40

codecov bot commented Sep 28, 2024

Codecov Report

Attention: Patch coverage is 97.46835% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.44%. Comparing base (f093ce2) to head (b89af63).
Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
sleap_nn/data/providers.py 95.55% 2 Missing ⚠️
sleap_nn/inference/predictors.py 97.56% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
+ Coverage   96.64%   97.44%   +0.79%     
==========================================
  Files          23       37      +14     
  Lines        1818     3559    +1741     
==========================================
+ Hits         1757     3468    +1711     
- Misses         61       91      +30     


Resolved review threads: sleap_nn/data/providers.py (1), sleap_nn/inference/predictors.py (6), sleap_nn/inference/topdown.py (3)
@gitttt-1234 gitttt-1234 requested a review from talmo October 2, 2024 02:09

@talmo talmo left a comment


You're still using LabelsReaderDP in a bunch of places in the test, but I think we want to test the new LabelsReader, no?

Resolved review threads: sleap_nn/data/resizing.py (2)
@gitttt-1234 gitttt-1234 changed the base branch from divya/load-backbone-weights to main October 3, 2024 21:25
@gitttt-1234 gitttt-1234 changed the base branch from main to divya/load-backbone-weights October 3, 2024 21:26
@gitttt-1234 gitttt-1234 changed the base branch from divya/load-backbone-weights to main October 3, 2024 21:28
@gitttt-1234 gitttt-1234 force-pushed the divya/remove-torchdata-inference branch from cf83eee to 0da9ad2 Compare October 3, 2024 21:48
@gitttt-1234 gitttt-1234 changed the base branch from main to divya/load-backbone-weights October 3, 2024 21:48

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post

Actionable comments posted: 27

🧹 Outside diff range and nitpick comments (26)
tests/inference/test_bottomup.py (3)

3-7: Consider removing unused imports

The imports for Path and shutil appear to be unused in the visible code. If they are not used elsewhere in this file, consider removing them to keep the imports clean and relevant.

If these imports are indeed unnecessary, you can apply the following diff:

-from pathlib import Path
-import shutil
import sleap_io as sio
from sleap_nn.data.providers import process_lf
from sleap_nn.data.normalization import apply_normalization
🧰 Tools
🪛 Ruff

3-3: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)


4-4: shutil imported but unused

Remove unused import: shutil

(F401)


36-38: Approved: Direct data loading and processing

The new approach of loading labels with sleap_io and processing them directly aligns well with the PR objective of removing IterDataPipe. This change simplifies the test setup and makes it more straightforward.

For improved clarity, consider adding a brief comment explaining the purpose of the unsqueeze operation:

 labels = sio.load_slp(minimal_instance)
 ex = process_lf(labels[0], 0, 2)
-ex["image"] = apply_normalization(ex["image"]).unsqueeze(dim=0)
+# Add batch dimension to match model input requirements
+ex["image"] = apply_normalization(ex["image"]).unsqueeze(dim=0)

Line range hint 67-102: Approved: Comprehensive inference testing

The updated test performs inference directly on the processed example and includes comprehensive assertions to verify the output structure and dimensions. This aligns well with the PR objective and ensures the inference model is working as expected.

To improve readability, consider grouping related assertions and adding comments to explain their purpose:

 output = inference_layer(ex)[0]
-assert "confmaps" not in output.keys()
-assert output["pred_instance_peaks"].is_nested
-assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
-assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
+# Verify that confmaps are not returned when not requested
+assert "confmaps" not in output.keys()
+
+# Check structure and dimensions of predicted peaks
+assert output["pred_instance_peaks"].is_nested
+assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
+assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)

 # ... (code for second inference) ...

 output = inference_layer(ex)[0]
-assert tuple(output["confmaps"].shape) == (1, 2, 192, 192)
-assert tuple(output["part_affinity_fields"].shape) == (1, 96, 96, 2)
-assert output["pred_instance_peaks"].is_nested
-assert output["peaks"][0].shape[-1] == 2
-assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
-assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
+# Verify dimensions of returned confmaps and part affinity fields
+assert tuple(output["confmaps"].shape) == (1, 2, 192, 192)
+assert tuple(output["part_affinity_fields"].shape) == (1, 96, 96, 2)
+
+# Check structure and dimensions of predicted peaks
+assert output["pred_instance_peaks"].is_nested
+assert output["peaks"][0].shape[-1] == 2
+assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
+assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
tests/data/test_resizing.py (2)

19-19: Improve variable naming for better readability.

While the change to use LabelsReaderDP is correct, the variable name l is ambiguous and could be improved for better code readability.

Consider renaming the variable to something more descriptive, like labels_reader:

-    l = LabelsReaderDP.from_filename(minimal_instance)
-    pipe = SizeMatcher(l, provider=l)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
+    pipe = SizeMatcher(labels_reader, provider=labels_reader)
🧰 Tools
🪛 Ruff

19-19: Ambiguous variable name: l

(E741)


52-52: Consistently improve variable naming throughout the file.

The ambiguous variable name l is used again. For consistency and improved readability, consider updating this and all similar occurrences in the file.

Apply this change here and in other similar instances:

-    l = LabelsReaderDP.from_filename(minimal_instance)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
🧰 Tools
🪛 Ruff

52-52: Ambiguous variable name: l

(E741)

tests/data/test_confmaps.py (1)

110-110: LGTM: Consistent update to data pipeline initialization.

The change from LabelsReader to LabelsReaderDP is consistent with the previous modifications and aligns with the PR objectives.

The static analysis tool flagged an unused import of numpy. Consider removing this import if it's not used elsewhere in the file:

-import numpy as np
tests/data/test_edge_maps.py (2)

Line range hint 193-199: LGTM! Consider adding tests for thread-based functionality.

The change from LabelsReader to LabelsReaderDP is consistent with the import statement modification and aligns with the PR objective of removing IterDataPipe.

While the current test covers the basic functionality, consider adding specific tests for the thread-based approach mentioned in the PR objectives. This would ensure that the new implementation behaves correctly in a multi-threaded environment.

Example test cases could include:

  1. Concurrent access to LabelsReaderDP from multiple threads
  2. Performance comparison between the old and new implementations
  3. Thread safety of the LabelsReaderDP methods

Would you like assistance in drafting these additional test cases?
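One way such a concurrency test could look, using a plain `queue.Queue` as a stand-in for the reader's output (hypothetical names, not the actual LabelsReaderDP interface): several worker threads drain a shared queue, and we assert that every item was consumed exactly once regardless of thread interleaving.

```python
import threading
import queue


def run_concurrent_consumers(n_items=100, n_workers=4):
    """Drain a shared queue from several threads; return everything consumed."""
    q = queue.Queue()
    for i in range(n_items):
        q.put(i)

    seen = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return  # queue drained; worker exits
            with lock:
                seen.append(item)  # guard the shared list

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


# Every item should be consumed exactly once, in some interleaved order.
assert sorted(run_concurrent_consumers()) == list(range(100))
```

`queue.Queue` is itself thread-safe; the explicit lock only protects the result list, which is the part a custom reader implementation would need to get right.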


Line range hint 1-199: Summary: Changes align with PR objectives, but thread-based approach not visible.

The modifications in this file successfully replace LabelsReader with LabelsReaderDP, which aligns with the PR objective of removing IterDataPipe. However, the thread-based approach mentioned in the PR objectives is not visible in these changes.

Consider the following suggestions for further development:

  1. Implement and test the thread-based approach mentioned in the PR objectives.
  2. Update the docstring of the test_part_affinity_fields_generator function to reflect the use of LabelsReaderDP.
  3. If LabelsReaderDP has any new features or behaviors different from LabelsReader, consider adding new test cases to cover these aspects.

These additions would ensure that the new implementation is thoroughly tested and documented, improving the overall quality and maintainability of the codebase.

sleap_nn/data/pipelines.py (2)

255-255: LGTM with a minor suggestion

The changes look good and align with the PR objective of removing the IterDataPipe dependency. The update to LabelsReaderDP in the docstring and the addition of centroids_confidence_maps to the keep_keys list are appropriate.

Consider adding a brief comment explaining the purpose of the centroids_confidence_maps key, especially if it's a new addition to the pipeline output. This would improve code readability and maintainability.

Also applies to: 279-279


356-356: LGTM with a suggestion for consistency

The changes are appropriate and align with the PR objective. The update to LabelsReaderDP in the docstring and the addition of part_affinity_fields to the keep_keys list are correct.

For consistency with the CentroidConfmapsPipeline class, consider adding a brief comment explaining the purpose of the part_affinity_fields key in the keep_keys list. This would improve code readability and maintain a consistent documentation style across pipeline classes.

Also applies to: 366-366

tests/inference/test_predictors.py (3)

87-93: Approve the introduction of max_instances variable

The introduction of the max_instances variable and its usage in the function call improves code readability and maintainability. This change allows for easier modification of the maximum instances value in the future.

Consider moving the max_instances variable to the top of the function or even making it a constant at the module level if it's used in multiple test functions. This would further improve the code's organization and make it easier to adjust this value across multiple tests if needed.


97-99: Approve the updated assertion using max_instances

The assertion has been correctly updated to use the max_instances variable, which aligns with the earlier changes. The use of <= in the assertion allows for flexibility in the number of instances, up to the maximum.

To improve clarity, consider adding a comment explaining why the number of centroids might be less than or equal to max_instances. This would help future readers understand the expected behavior more easily.


Line range hint 1-450: Summary of changes and recommendations

The changes in this file primarily involve adjustments to test parameters and expected outcomes for various predictor classes. Here are the key points that need attention:

  1. The lowering of peak_threshold values in multiple tests may impact the number of detected instances. Verify that these changes align with the expected behavior of the predictors.

  2. The significant increase in the expected number of labels (from 100 to 1100) in the test_single_instance_predictor function needs clarification, especially considering the addition of the videoreader_end_idx parameter limiting frames to 100.

  3. The introduction of the max_instances variable improves code readability, but consider moving it to a more prominent location for better visibility.

  4. Some assertions have been updated to reflect new output structures or use new variables. Ensure these changes accurately represent the expected behavior of the predictors.

Overall, while the changes seem to be part of updating the tests to match new predictor behaviors, it's crucial to verify that all modifications align with the intended functionality and performance of the predictors. Consider adding comments to explain significant changes in expected outcomes or test parameters to improve code maintainability and clarity for future readers.

tests/training/test_model_trainer.py (2)

73-73: Document new parameter and verify test setup

A new parameter minimal_instance_bottomup_ckpt: str has been added to the test_trainer function. This parameter is likely used for weight loading verification.

Please consider the following improvements:

  1. Add a docstring to explain the purpose and expected format of the minimal_instance_bottomup_ckpt parameter.
  2. Ensure that the test setup (e.g., in conftest.py or similar) provides this checkpoint file correctly.

Example docstring:

def test_trainer(config, tmp_path: str, minimal_instance_bottomup_ckpt: str):
    """
    Test the ModelTrainer class.

    Args:
        config: The configuration object for the test.
        tmp_path (str): Temporary path for saving test outputs.
        minimal_instance_bottomup_ckpt (str): Path to a minimal instance bottom-up model checkpoint for weight verification.
    """
    # ... (rest of the function)

280-297: Approve weight loading verification with a suggestion

The addition of weight loading verification is a good practice to ensure model consistency between saved checkpoints and loaded models.

Consider increasing the precision of the weight comparison:

-    assert np.all(np.abs(first_layer_ckpt - model_ckpt) < 1e-3)
+    assert np.allclose(first_layer_ckpt, model_ckpt, atol=1e-5, rtol=1e-5)

This change uses np.allclose() with stricter tolerance levels, which can catch smaller discrepancies in weight loading.
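The difference matters most for small-magnitude weights: a drift that slips under a fixed 1e-3 absolute bound can still be large relative to the weight itself. A standalone illustration (toy arrays, not the actual checkpoint tensors):

```python
import numpy as np

saved = np.array([0.5, -1.2])
loaded = np.array([0.5005, -1.2])  # 5e-4 drift on a small weight

# Loose check: absolute difference only, as in the original assertion.
loose = bool(np.all(np.abs(saved - loaded) < 1e-3))
# Strict check: np.allclose passes iff |a - b| <= atol + rtol * |b| elementwise.
strict = bool(np.allclose(saved, loaded, atol=1e-5, rtol=1e-5))
print(loose, strict)  # True False
```

The 5e-4 drift is within 1e-3 absolutely, but it is 0.1% of the weight's value, far above the combined tolerance of roughly 1.5e-5, so the stricter check catches it.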

tests/inference/test_topdown.py (3)

85-85: Simplify dictionary key check

Instead of using key not in dict.keys(), use key not in dict for a more idiomatic and efficient check.

Apply this diff to simplify the key check:

-assert "instance_image" not in out.keys()
+assert "instance_image" not in out
🧰 Tools
🪛 Ruff

85-85: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)


113-120: Review commented-out code in data processing

The line # ex["centroids"] = generate_centroids(ex["instances"], 0) is commented out. If this code is no longer necessary, consider removing it. If it is needed, uncomment and ensure it functions correctly.


123-126: Improve parameter alignment in CentroidCrop

For better readability, align the parameters in the CentroidCrop instantiation vertically.

Apply this diff to adjust the alignment:

     topdown_inf_layer = TopDownInferenceModel(
-        centroid_crop=CentroidCrop(
-            use_gt_centroids=True, anchor_ind=0, crop_hw=(160, 160)
-        ),
+        centroid_crop=CentroidCrop(
+            use_gt_centroids=True,
+            anchor_ind=0,
+            crop_hw=(160, 160)
+        ),
         instance_peaks=FindInstancePeaksGroundTruth(),
     )
sleap_nn/data/providers.py (1)

Line range hint 155-186: Refactor to eliminate code duplication in data processing.

The logic for processing LabeledFrame objects into sample dictionaries is duplicated in both LabelsReaderDP.__iter__ and LabelsReader.run. Consider refactoring by utilizing the existing process_lf function to reduce code redundancy and improve maintainability.

Also applies to: 339-380

sleap_nn/training/model_trainer.py (3)

584-584: Simplify dictionary iteration by removing .keys()

When iterating over dictionary keys, it's more efficient to use for k in dict instead of for k in dict.keys().

Apply this diff to simplify:

        ckpt["state_dict"] = {
            k: ckpt["state_dict"][k]
-           for k in ckpt["state_dict"].keys()
+           for k in ckpt["state_dict"]
            if ".head" not in k
        }
🧰 Tools
🪛 Ruff

584-584: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


578-578: Address the TODO: Handle different input channels

There's a TODO comment indicating the need to handle different input channels. This is important for ensuring compatibility when loading pre-trained weights with models expecting different input dimensions.

Would you like assistance in implementing support for different input channels? I can help generate a solution or open a GitHub issue to track this task.


510-510: Use Optional[str] for trained_ckpts_path type annotation

The trained_ckpts_path parameter is currently annotated as str = None, which may cause confusion since None is not an instance of str. To improve type correctness, consider using Optional[str] = None for parameters that can be None.

Apply this diff to update the type annotations:

     def __init__(
         self,
         config: OmegaConf,
         skeletons: Optional[List[sio.Skeleton]],
         model_type: str,
-        trained_ckpts_path: str = None,
+        trained_ckpts_path: Optional[str] = None,
     ):

Apply similar changes in all occurrences where trained_ckpts_path is defined.

Also applies to: 692-692, 765-765, 838-838, 911-911
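A minimal illustration of the annotation (hypothetical function, not the actual trainer signature): `Optional[str]` tells type checkers that `None` is a legal value, which a bare `str = None` default does not.

```python
from typing import Optional


def load_weights(trained_ckpts_path: Optional[str] = None) -> str:
    # None means "no pre-trained weights": fall back to random initialization.
    if trained_ckpts_path is None:
        return "random-init"
    return f"loaded:{trained_ckpts_path}"


print(load_weights())             # random-init
print(load_weights("best.ckpt"))  # loaded:best.ckpt
```

At runtime both annotations behave identically; the change only affects static analysis, where `str = None` is flagged as a type error by mypy in strict mode.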

sleap_nn/inference/predictors.py (3)

13-13: Remove unused import 'litdata'

The import litdata is not used in the code and can be safely removed to clean up unnecessary dependencies.

🧰 Tools
🪛 Ruff

13-13: litdata imported but unused

Remove unused import: litdata

(F401)


19-19: Remove unused import 'apply_sizematcher'

The function apply_sizematcher is imported but not used. Consider removing it to maintain code cleanliness.

🧰 Tools
🪛 Ruff

19-19: sleap_nn.data.resizing.apply_sizematcher imported but unused

Remove unused import: sleap_nn.data.resizing.apply_sizematcher

(F401)


27-28: Remove unused imports 'generate_centroids' and 'generate_crops'

The functions generate_centroids and generate_crops are imported but not utilized in the code. Removing unused imports can help in reducing clutter.

🧰 Tools
🪛 Ruff

27-27: sleap_nn.data.instance_centroids.generate_centroids imported but unused

Remove unused import: sleap_nn.data.instance_centroids.generate_centroids

(F401)


28-28: sleap_nn.data.instance_cropping.generate_crops imported but unused

Remove unused import: sleap_nn.data.instance_cropping.generate_crops

(F401)

sleap_nn/inference/topdown.py (1)

531-533: Move eval() call outside the loop

You're calling self.instance_peaks.eval() inside the loop. To avoid redundant calls, move it outside the loop.

Apply this diff to refactor the code:

+        self.instance_peaks.eval()
         for i in batch:
-            self.instance_peaks.eval()
             peaks_output.append(self.instance_peaks(i))
🛑 Comments failed to post (27)
tests/data/test_normalization.py (1)

9-9: 💡 Codebase verification

Incomplete removal of LabelsReader imports and usages.

Several files still import and use LabelsReader, which prevents the complete removal of the IterDataPipe dependency as intended by the PR. Please update the following files to replace LabelsReader with LabelsReaderDP:

  • sleap_nn/inference/predictors.py
  • sleap_nn/data/resizing.py
  • tests/data/test_providers.py
  • tests/data/test_normalization.py
  • tests/data/test_instance_cropping.py
  • tests/data/test_instance_centroids.py
  • tests/data/test_edge_maps.py
  • tests/data/test_confmaps.py
  • tests/data/test_augmentation.py
  • tests/data/test_pipelines.py
🔗 Analysis chain

Approve import change and verify consistency.

The change from LabelsReader to LabelsReaderDP aligns with the PR objective to remove IterDataPipe dependency. This likely implements the new thread-based approach.

Let's verify if this change is consistent across the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining imports of LabelsReader and usages of LabelsReaderDP

# Search for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type python "from sleap_nn.data.providers import LabelsReader"

# Search for new imports of LabelsReaderDP
echo "Checking for new LabelsReaderDP imports:"
rg --type python "from sleap_nn.data.providers import LabelsReaderDP"

# Search for any remaining usage of LabelsReader
echo "Checking for remaining LabelsReader usage:"
rg --type python "LabelsReader\."

# Search for new usage of LabelsReaderDP
echo "Checking for new LabelsReaderDP usage:"
rg --type python "LabelsReaderDP\."

Length of output: 470


Script:

#!/bin/bash
# Description: Check for any remaining imports of LabelsReader and usages of LabelsReaderDP

# Search for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg -g "*.py" "from sleap_nn\.data\.providers import LabelsReader"

# Search for new imports of LabelsReaderDP
echo "Checking for new LabelsReaderDP imports:"
rg -g "*.py" "from sleap_nn\.data\.providers import LabelsReaderDP"

# Search for any remaining usage of LabelsReader
echo "Checking for remaining LabelsReader usage:"
rg -g "*.py" "LabelsReader\."

# Search for new usage of LabelsReaderDP
echo "Checking for new LabelsReaderDP usage:"
rg -g "*.py" "LabelsReaderDP\."

Length of output: 4193

tests/data/test_instance_cropping.py (1)

13-13: 💡 Codebase verification

Remove remaining imports of LabelsReader.

The shell scripts indicate that LabelsReader is still imported in the following file:

  • sleap_nn/inference/predictors.py:from sleap_nn.data.providers import LabelsReader, VideoReader

Please update this import to LabelsReaderDP to align with the PR objectives.

🔗 Analysis chain

Approve import change and verify its impact.

The import change from LabelsReader to LabelsReaderDP aligns with the PR objective of removing IterDataPipe. This change appears to be part of the transition to a new system.

To ensure this change is consistent across the codebase, please run the following script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and absence of LabelsReader

# Test 1: Check for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type python 'from sleap_nn\.data\.providers import LabelsReader'

# Test 2: Verify the correct import and usage of LabelsReaderDP
echo "Verifying LabelsReaderDP usage:"
rg --type python 'from sleap_nn\.data\.providers import LabelsReaderDP'
rg --type python 'LabelsReaderDP\.from_filename'

Length of output: 461


Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and absence of LabelsReader

# Test 1: Check for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type py 'from sleap_nn\.data\.providers import LabelsReader'

# Test 2: Verify the correct import and usage of LabelsReaderDP
echo "Verifying LabelsReaderDP usage:"
rg --type py 'from sleap_nn\.data\.providers import LabelsReaderDP'
rg --type py 'LabelsReaderDP\.from_filename'

Length of output: 3724

tests/data/test_augmentation.py (1)

13-13: 💡 Codebase verification

Remaining LabelsReader Imports Found

The following files still import LabelsReader and should be updated to use LabelsReaderDP:

  • sleap_nn/inference/predictors.py

Please update these imports to ensure consistency and complete the removal of the IterDataPipe dependency.

🔗 Analysis chain

LGTM! Verify other imports across the codebase.

The import change from LabelsReader to LabelsReaderDP aligns with the PR objective of removing the IterDataPipe dependency. This change is correct and follows Python conventions.

To ensure consistency across the codebase, please run the following script to check for any remaining LabelsReader imports:

If the script returns any results, those files may need to be updated to use LabelsReaderDP instead.


Verification script encountered errors. Please run the following corrected script to check for any remaining LabelsReader imports:


Verification script encountered errors. Please run the following corrected script to check for any remaining LabelsReader imports:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining LabelsReader imports

# Test: Search for LabelsReader imports
rg --type python 'from sleap_nn.data.providers import LabelsReader'

Length of output: 321


Script:

#!/bin/bash
# Description: Check for any remaining LabelsReader imports

# Test: Search for LabelsReader imports
rg --type python 'from sleap_nn.data.providers import LabelsReader' || echo "No remaining LabelsReader imports found."

Length of output: 191


Script:

#!/bin/bash
# Description: Check for any remaining LabelsReader imports

# Test: Search for LabelsReader imports using the correct file type
rg --type py 'from sleap_nn.data.providers import LabelsReader' || echo "No remaining LabelsReader imports found."

Length of output: 972

tests/data/test_resizing.py (1)

68-68: 🛠️ Refactor suggestion

Refactor ambiguous variable names file-wide.

The ambiguous variable name l is used consistently throughout the file. To improve code readability and maintainability, it's recommended to refactor all instances of this variable name.

Would you like me to create a script to automate this change across the file? The script would replace all instances of l = LabelsReaderDP.from_filename(...) with labels_reader = LabelsReaderDP.from_filename(...).

🧰 Tools
🪛 Ruff

68-68: Ambiguous variable name: l

(E741)

tests/data/test_confmaps.py (1)

22-22: 💡 Codebase verification

Issue: Remaining instances of LabelsReader detected.

Several instances of LabelsReader remain in the codebase. Please update them to LabelsReaderDP to ensure consistency and maintain compatibility.

  • tests/inference/test_predictors.py
  • tests/tracking/test_tracker.py
  • tests/tracking/candidates/test_local_queues.py
  • tests/tracking/candidates/test_fixed_window.py
  • tests/data/test_providers.py
  • sleap_nn/data/providers.py
  • sleap_nn/inference/predictors.py
  • sleap_nn/data/pipelines.py
🔗 Analysis chain

LGTM: Data pipeline initialization updated.

The change from LabelsReader to LabelsReaderDP is consistent with the PR objectives. The function signature remains the same, which is good for maintaining compatibility.

To ensure the new implementation is fully compatible, please run the following verification script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader
rg --type python 'LabelsReader(?!DP)' -g '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 207


Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader not followed by DP
rg --type python 'LabelsReader(?!DP)' --ignore-file '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 318


Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader not followed by DP using PCRE2 for look-around support
rg --pcre2 --type python 'LabelsReader(?!DP)' --ignore-file '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 192


Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader not followed by DP using PCRE2 for look-around support
rg --pcre2 'LabelsReader(?!DP)' --type py -g '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 4395

sleap_nn/data/resizing.py (1)

7-7: 💡 Codebase verification

IterDataPipe is still in use and should not be removed

The IterDataPipe is still imported and used in the following classes within sleap_nn/data/resizing.py:

  • Resizer(IterDataPipe)
  • PadToStride(IterDataPipe)
  • SizeMatcher(IterDataPipe)

If the intention is to remove IterDataPipe, please refactor these classes accordingly. Otherwise, consider retaining the import to maintain existing functionality.

🔗 Analysis chain

Approve change, but question IterDataPipe usage

The change from LabelsReader to LabelsReaderDP aligns with the PR objective of transitioning away from IterDataPipe. However, I noticed that IterDataPipe is still imported on line 9.

Could you clarify if IterDataPipe is still needed in this file? If not, consider removing the import:

-from torch.utils.data.datapipes.datapipe import IterDataPipe

To verify the usage of IterDataPipe, let's run the following script:

This will help us determine if there are any classes still inheriting from IterDataPipe in this file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for IterDataPipe usage in the file

# Test: Search for IterDataPipe usage
rg --type python 'class.*\(IterDataPipe\)' sleap_nn/data/resizing.py

Length of output: 187


Script:

#!/bin/bash
# Description: Check for IterDataPipe usage in sleap_nn/data/resizing.py

ast-grep --lang python --pattern 'class $_(IterDataPipe) { $$$ }' sleap_nn/data/resizing.py

Length of output: 546


Script:

#!/bin/bash
# Description: Check for IterDataPipe usage in sleap_nn/data/resizing.py

# Search for 'IterDataPipe' in the specified file
grep -E 'IterDataPipe' sleap_nn/data/resizing.py

Length of output: 997

sleap_nn/data/pipelines.py (1)

255-255: 💡 Codebase verification

Incomplete Removal of LabelsReader: Multiple Instances Still Present

The verification revealed that LabelsReader is still being used in several parts of the codebase:

  • Test Files:

    • tests/inference/test_predictors.py
    • tests/tracking/candidates/test_fixed_window.py
    • tests/tracking/test_tracker.py
    • tests/tracking/candidates/test_local_queues.py
    • tests/data/test_providers.py
  • Implementation Files:

    • sleap_nn/data/providers.py
    • sleap_nn/inference/predictors.py
  • Documentation:

    • sleap_nn/data/pipelines.py (in docstrings)

To fully align with the PR objective of removing the IterDataPipe dependency, please update all instances of LabelsReader to LabelsReaderDP.

🔗 Analysis chain

Summary: Changes align with PR objective and maintain consistency

The modifications to CentroidConfmapsPipeline and BottomUpPipeline classes are consistent and appropriate:

  1. Both classes now use LabelsReaderDP instead of LabelsReader in their docstrings, aligning with the PR objective of removing IterDataPipe dependency.
  2. New keys (centroids_confidence_maps and part_affinity_fields) are added to the respective keep_keys lists, ensuring that the pipeline outputs include the necessary data.

These changes appear to successfully update the data pipeline to work with the new data provider while maintaining the required functionality.

To ensure that these changes are consistent throughout the codebase, please run the following script:

This script will help identify any inconsistencies in the usage of LabelsReaderDP and the new keys across the codebase.

Also applies to: 279-279, 356-356, 366-366


To address the issues found during script execution, please run the updated verification script below:

This updated script includes the --pcre2 flag to support look-around in the first rg command and corrects the file type to py in both commands. Running this should provide the necessary information to verify the changes effectively.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and the presence of new keys in other pipeline classes.

# Check for any remaining instances of LabelsReader
echo "Checking for remaining instances of LabelsReader:"
rg "LabelsReader(?!DP)" --type python

# Check for consistent usage of new keys in keep_keys lists
echo "Checking for consistent usage of new keys in keep_keys lists:"
rg "keep_keys\s*=\s*\[" --type python -A 10

Length of output: 592


Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and the presence of new keys in other pipeline classes.

# Check for any remaining instances of LabelsReader using PCRE2 for look-around
echo "Checking for remaining instances of LabelsReader:"
rg --pcre2 "LabelsReader(?!DP)" --type py

# Check for consistent usage of new keys in keep_keys lists
echo "Checking for consistent usage of new keys in keep_keys lists:"
rg "keep_keys\s*=\s*\[" --type py -A 10

Length of output: 7075

tests/data/test_pipelines.py (1)

566-566: 💡 Codebase verification

Issues Found: Incomplete Replacement of LabelsReader with LabelsReaderDP.

The following references to LabelsReader were found and need to be updated:

  • tests/tracking/test_tracker.py: provider="LabelsReader"
  • tests/tracking/candidates/test_fixed_window.py: provider="LabelsReader"
  • tests/tracking/candidates/test_local_queues.py: provider="LabelsReader"
  • tests/inference/test_predictors.py: Multiple instances of provider="LabelsReader" and related comments
  • tests/assets/minimal_instance_bottomup/training_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance_centroid/training_config.yaml: provider: LabelsReader
  • tests/data/test_providers.py: Multiple instances including class instantiations and docstrings
  • tests/assets/minimal_instance/initial_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance/training_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance_centroid/initial_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance_bottomup/initial_config.yaml: provider: LabelsReader

Please ensure all instances of LabelsReader are replaced with LabelsReaderDP to maintain consistency and functionality.

🔗 Analysis chain

LGTM: LabelsReader consistently replaced with LabelsReaderDP.

These changes are consistent with the previous modifications to use LabelsReaderDP throughout the file. All constructor calls remain unchanged, indicating straightforward replacements.

To ensure the LabelsReaderDP implementation is complete and consistent, run the following script:

Also applies to: 610-610, 688-688, 768-768

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the LabelsReaderDP implementation

# Check for any remaining references to LabelsReader
rg "LabelsReader(?!DP)" tests/

# Compare the method signatures of LabelsReader and LabelsReaderDP
rg -U --multiline "class LabelsReader.*?:.*?\n.*?def" tests/
rg -U --multiline "class LabelsReaderDP.*?:.*?\n.*?def" tests/

Length of output: 489


Script:

#!/bin/bash
# Description: Verify the LabelsReaderDP implementation with corrected commands

# Check for any remaining references to LabelsReader not followed by DP using PCRE2
rg --pcre2 "LabelsReader(?!DP)" tests/

# Compare the method signatures of LabelsReader and LabelsReaderDP using ast-grep
ast-grep --lang python --pattern $'class LabelsReaderDP:
    $$$
    def $$$' tests/

ast-grep --lang python --pattern $'class LabelsReader:
    $$$
    def $$$' tests/

Length of output: 2529

tests/inference/test_single_instance.py (2)

84-84: ⚠️ Potential issue

Simplify membership test in assertion

Instead of checking 'key in dict.keys()', you can directly check 'key in dict' for a more concise and Pythonic approach.

Apply this diff to fix the issue:

-        assert "pred_confmaps" in outputs[0][0].keys()
+        assert "pred_confmaps" in outputs[0][0]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    assert "pred_confmaps" in outputs[0][0]
🧰 Tools
🪛 Ruff

84-84: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


71-72: ⚠️ Potential issue

Avoid in-place modification of ex["image"]

Modifying ex["image"] in place may affect other tests or assertions that rely on the original image. Consider creating a copy of the example or the image before resizing to prevent unintended side effects.

Apply this diff to fix the issue:

-        ex["image"] = resize_image(ex["image"], 0.5)
+        ex_resized = ex.copy()
+        ex_resized["image"] = resize_image(ex_resized["image"], 0.5)
-        outputs.append(find_peaks_layer(ex))
+        outputs.append(find_peaks_layer(ex_resized))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    ex_resized = ex.copy()
    ex_resized["image"] = resize_image(ex_resized["image"], 0.5)
    outputs.append(find_peaks_layer(ex_resized))
tests/data/test_providers.py (5)

110-110: ⚠️ Potential issue

Unused loop variable i should be replaced with _

The loop variable i is not used within the loop body. To indicate that it's intentionally unused, consider renaming it to _.

Apply this diff:

         data = []
-        for i in range(batch_size):
+        for _ in range(batch_size):
             frame = reader.frame_buffer.get()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        data = []
        for _ in range(batch_size):
            frame = reader.frame_buffer.get()
🧰 Tools
🪛 Ruff

110-110: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


142-142: 🛠️ Refactor suggestion

Avoid using bare except: statements

As above, consider specifying the exception type to avoid catching unintended exceptions.

Apply this diff:

-    except:
+    except Exception:
         raise

Committable suggestion was skipped due to low confidence.


133-133: ⚠️ Potential issue

Unused loop variable i should be replaced with _

Similarly, in this loop, the variable i is not used. Renaming it to _ improves code readability.

Apply this diff:

         data = []
-        for i in range(batch_size):
+        for _ in range(batch_size):
             frame = reader.frame_buffer.get()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        data = []
        for _ in range(batch_size):
            frame = reader.frame_buffer.get()
🧰 Tools
🪛 Ruff

133-133: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


119-119: 🛠️ Refactor suggestion

Avoid using bare except: statements

Using a bare except: can catch unexpected exceptions, including system-exiting exceptions. It's better to catch specific exceptions or use except Exception: to handle standard errors.

Apply this diff:

-    except:
+    except Exception:
         raise

Committable suggestion was skipped due to low confidence.
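For context on why the suggestions above matter, here is a minimal, self-contained illustration (not code from sleap-nn) of the difference between a bare `except:` and `except Exception:` — the bare form also swallows `SystemExit` and `KeyboardInterrupt`, which inherit from `BaseException` rather than `Exception`:

```python
# A bare `except:` catches every BaseException subclass, including
# SystemExit; `except Exception:` lets such exceptions propagate.

def swallow_all():
    try:
        raise SystemExit(1)
    except:  # noqa: E722 - bare except, shown here for demonstration only
        return "swallowed"

def swallow_errors_only():
    try:
        raise SystemExit(1)
    except Exception:
        return "swallowed"  # never reached: SystemExit is not an Exception

print(swallow_all())  # -> swallowed
try:
    swallow_errors_only()
except SystemExit:
    print("SystemExit propagated")  # -> SystemExit propagated
```

This is why a worker thread that wraps its loop in a bare `except:` can accidentally suppress interpreter shutdown signals instead of re-raising them.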


18-18: ⚠️ Potential issue

Rename variable l to a more descriptive name

The variable l in line 18 is a single-character variable name, which can be ambiguous and reduce code readability. It's recommended to use more descriptive variable names for better clarity.

Apply this diff to rename the variable:

 def test_providers(minimal_instance):
     """Test LabelsReaderDP module."""
-    l = LabelsReaderDP.from_filename(minimal_instance)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
-    sample = next(iter(l))
+    sample = next(iter(labels_reader))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
🧰 Tools
🪛 Ruff

18-18: Ambiguous variable name: l

(E741)

tests/inference/test_topdown.py (4)

11-11: ⚠️ Potential issue

Remove unused imports Resizer and PadToStride

The imports Resizer and PadToStride from sleap_nn.data.resizing are not used in the code and should be removed to reduce clutter.

Apply this diff to remove the unused imports:

-from sleap_nn.data.resizing import SizeMatcher, Resizer, PadToStride
+from sleap_nn.data.resizing import SizeMatcher
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from sleap_nn.data.resizing import SizeMatcher
🧰 Tools
🪛 Ruff

11-11: sleap_nn.data.resizing.Resizer imported but unused

Remove unused import

(F401)


11-11: sleap_nn.data.resizing.PadToStride imported but unused

Remove unused import

(F401)


8-8: ⚠️ Potential issue

Remove unused import resize_image

The import resize_image from sleap_nn.data.resizing is not utilized in the code and should be removed to clean up the imports.

Apply this diff to remove the unused import:

-from sleap_nn.data.resizing import resize_image
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.


🧰 Tools
🪛 Ruff

8-8: sleap_nn.data.resizing.resize_image imported but unused

Remove unused import: sleap_nn.data.resizing.resize_image

(F401)


12-12: ⚠️ Potential issue

Remove unused import InstanceCropper

The import InstanceCropper from sleap_nn.data.instance_cropping is not utilized and should be removed to clean up the code.

Apply this diff to remove the unused import:

-from sleap_nn.data.instance_cropping import InstanceCropper, generate_crops
+from sleap_nn.data.instance_cropping import generate_crops
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from sleap_nn.data.instance_cropping import generate_crops
🧰 Tools
🪛 Ruff

12-12: sleap_nn.data.instance_cropping.InstanceCropper imported but unused

Remove unused import: sleap_nn.data.instance_cropping.InstanceCropper

(F401)


9-9: ⚠️ Potential issue

Remove unused import InstanceCentroidFinder

The import InstanceCentroidFinder from sleap_nn.data.instance_centroids is not used and can be removed to streamline the code.

Apply this diff to remove the unused import:

-from sleap_nn.data.instance_centroids import InstanceCentroidFinder, generate_centroids
+from sleap_nn.data.instance_centroids import generate_centroids
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from sleap_nn.data.instance_centroids import generate_centroids
🧰 Tools
🪛 Ruff

9-9: sleap_nn.data.instance_centroids.InstanceCentroidFinder imported but unused

Remove unused import: sleap_nn.data.instance_centroids.InstanceCentroidFinder

(F401)

sleap_nn/data/providers.py (3)

255-257: ⚠️ Potential issue

Correct the docstring to reflect the method functionality.

The from_filename method in VideoReader is intended to load a video file, but the docstring incorrectly mentions a .slp file, which is a labels file. Update the docstring to accurately describe the method's purpose.

Apply this diff:

     @classmethod
     def from_filename(
         cls,
         filename: str,
         queue_maxsize: int,
         start_idx: Optional[int] = None,
         end_idx: Optional[int] = None,
     ):
-        """Create VideoReader from a .slp filename."""
+        """Create VideoReader from a video filename."""
         video = sio.load_video(filename)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        """Create VideoReader from a video filename."""
        video = sio.load_video(filename)
        frame_buffer = Queue(maxsize=queue_maxsize)

383-385: 🛠️ Refactor suggestion

Use logging instead of print statements for error reporting.

Similarly, in LabelsReader.run, replace print statements with logging.error for consistent and configurable error logging.

Apply this diff:

 # At the top of the file, add:
+import logging

 # Inside the exception handling block:
 except Exception as e:
-    print(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
+    logging.error(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

import logging

        except Exception as e:
            logging.error(f"Error when reading labelled frame. Stopping labels reader.\n{e}")

280-282: 🛠️ Refactor suggestion

Use logging instead of print statements for error reporting.

In the exception handling of VideoReader.run, replace print statements with the logging module for better logging practices and configurable error handling.

Apply this diff:

 # At the top of the file, add:
+import logging

 # Inside the exception handling block:
 except Exception as e:
-    print(f"Error when reading video frame. Stopping video reader.\n{e}")
+    logging.error(f"Error when reading video frame. Stopping video reader.\n{e}")

Committable suggestion was skipped due to low confidence.

sleap_nn/training/model_trainer.py (1)

579-588: ⚠️ Potential issue

Security Risk: Unrestricted unpickling with torch.load can lead to code execution

The use of torch.load(trained_ckpts_path) without proper safeguards can pose a security risk. Unpickling data from untrusted sources can execute arbitrary code. Ensure that trained_ckpts_path comes from a trusted source, and consider using a more secure method for loading model weights.

Additionally, it's important to add error handling to manage cases where the checkpoint file might be missing or corrupted.

Apply the following changes to enhance security and error handling:

+import pickle
+import errno
...
    if trained_ckpts_path is not None:
        print(f"Loading weights from `{trained_ckpts_path}` ...")
+       if not os.path.isfile(trained_ckpts_path):
+           raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), trained_ckpts_path)
        try:
-           ckpt = torch.load(trained_ckpts_path)
+           with open(trained_ckpts_path, 'rb') as f:
+               ckpt = torch.load(f, map_location=self.device, pickle_module=pickle)
        except Exception as e:
            raise RuntimeError(f"Error loading checkpoint: {e}")
        ckpt["state_dict"] = {
            k: ckpt["state_dict"][k]
            for k in ckpt["state_dict"]
            if ".head" not in k
        }
        self.load_state_dict(ckpt["state_dict"], strict=False)

Committable suggestion was skipped due to low confidence.

🧰 Tools
🪛 Ruff

584-584: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
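The head-stripping step in the suggestion above is independent of the checkpoint-loading concern and is just a dictionary filter, so it can be sketched without PyTorch. The key names below are hypothetical placeholders, not actual sleap-nn layer names:

```python
# Drop head-layer weights from a checkpoint's state dict so that only
# backbone weights are transferred to the new model (key names are
# hypothetical examples).
ckpt = {
    "state_dict": {
        "backbone.conv1.weight": [1.0],
        "backbone.conv2.weight": [2.0],
        "model.head.linear.weight": [3.0],
    }
}

# Equivalent to the dict comprehension in the suggested diff, but iterating
# items() directly avoids the `k in dict.keys()` anti-pattern (SIM118).
filtered = {k: v for k, v in ckpt["state_dict"].items() if ".head" not in k}
print(sorted(filtered))  # -> ['backbone.conv1.weight', 'backbone.conv2.weight']
```

The filtered dict would then be passed to `load_state_dict(..., strict=False)` so that the missing head keys are tolerated.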

sleap_nn/inference/predictors.py (3)

634-634: ⚠️ Potential issue

Use ValueError instead of Exception for invalid provider

Raising a ValueError is more appropriate when indicating an invalid argument value. It provides clearer information about the nature of the error.

Apply this diff to improve error handling:

- raise Exception(
+ raise ValueError(
    "Provider not recognised. Please use either `LabelsReader` or `VideoReader` as provider"
)

Committable suggestion was skipped due to low confidence.


1244-1244: ⚠️ Potential issue

Use ValueError instead of Exception for invalid provider

Switching to ValueError for invalid argument values aligns with Python's standard exception handling practices.

Apply this diff to improve error handling:

- raise Exception(
+ raise ValueError(
    "Provider not recognised. Please use either `LabelsReader` or `VideoReader` as provider"
)

Committable suggestion was skipped due to low confidence.


305-305: ⚠️ Potential issue

Call self.pipeline.stop() before self.pipeline.join()

To ensure proper termination of the pipeline threads, call self.pipeline.stop() before joining. This allows the pipeline to cease operations gracefully before the main thread waits for it to finish.

Apply this diff to address the issue:

  # At the end of _predict_generator method
+ self.pipeline.stop()
  self.pipeline.join()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        self.pipeline.stop()
        self.pipeline.join()
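As a minimal sketch of the stop-before-join ordering (a hypothetical reader class, not the actual sleap-nn implementation): the producer thread checks a stop flag on each iteration, so setting the flag before calling `join()` lets the thread exit its loop cleanly instead of the main thread waiting on a producer that is still pushing frames.

```python
import threading
from queue import Queue

class FrameReader(threading.Thread):
    """Toy producer that pushes frame dicts into a bounded buffer."""

    def __init__(self, n_frames: int, queue_maxsize: int = 4):
        super().__init__()
        self.n_frames = n_frames
        self.frame_buffer = Queue(maxsize=queue_maxsize)
        self._stopped = threading.Event()

    def run(self):
        for idx in range(self.n_frames):
            if self._stopped.is_set():
                break  # graceful early exit when stop() was requested
            self.frame_buffer.put({"frame_idx": idx})
        self.frame_buffer.put(None)  # sentinel: no more frames

    def stop(self):
        self._stopped.set()

reader = FrameReader(n_frames=3)
reader.start()
frames = []
while (item := reader.frame_buffer.get()) is not None:
    frames.append(item["frame_idx"])
reader.stop()   # signal the producer to stop...
reader.join()   # ...then wait for it to finish
print(frames)  # -> [0, 1, 2]
```

Note that the queue is sized so the sentinel always fits; in a real pipeline the consumer must keep draining the buffer (or the buffer must have room) so the producer's `put()` cannot block forever after `stop()` is set.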
sleap_nn/inference/topdown.py (1)

518-523: ⚠️ Potential issue

Simplify nested if statements

You can simplify the nested if statements into a single condition.

Apply this diff to streamline the code:

-if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
-    if "instances" not in batch:
-        raise ValueError(
-            "Ground truth data was not detected... "
-            "Please load both models when predicting on non-ground-truth data."
-        )
+if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth) and "instances" not in batch:
+    raise ValueError(
+        "Ground truth data was not detected... "
+        "Please load both models when predicting on non-ground-truth data."
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth) and "instances" not in batch:
            raise ValueError(
                "Ground truth data was not detected... "
                "Please load both models when predicting on non-ground-truth data."
            )
🧰 Tools
🪛 Ruff

518-519: Use a single if statement instead of nested if statements

(SIM102)

@gitttt-1234 gitttt-1234 force-pushed the divya/remove-torchdata-inference branch from 0da9ad2 to 80023b3 Compare October 3, 2024 22:05
@gitttt-1234 gitttt-1234 force-pushed the divya/remove-torchdata-inference branch from 7a2cce7 to b89af63 Compare October 3, 2024 22:31
@gitttt-1234 gitttt-1234 changed the base branch from divya/load-backbone-weights to main October 3, 2024 22:31
@gitttt-1234 gitttt-1234 merged commit 2f0fedc into main Oct 3, 2024
7 checks passed
@gitttt-1234 gitttt-1234 deleted the divya/remove-torchdata-inference branch October 3, 2024 22:57
@gitttt-1234 gitttt-1234 restored the divya/remove-torchdata-inference branch October 3, 2024 23:08
@gitttt-1234 gitttt-1234 deleted the divya/remove-torchdata-inference branch October 3, 2024 23:39